A Linear-Time Multivariate Micro-aggregation for Privacy Protection in Uniform Very Large Data Sets
Abstract
Optimally micro-aggregating a multivariate data set is known to be NP-hard; heuristic approaches are therefore used to cope with this privacy-preserving problem. Unfortunately, the algorithms in the literature are computationally costly, which prevents their use on large data sets. We propose a partitioning algorithm that micro-aggregates uniform very large data sets with cost O(n). We provide the mathematical foundations proving the efficiency of our algorithm, and we show that the error associated with micro-aggregation is bounded and decreases as the number of micro-aggregated records grows. The experimental results confirm the predictions of the mathematical analysis. In addition, we compare our proposal with MDAV, a well-known micro-aggregation algorithm with cost O(n^2).
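To make the terms in the abstract concrete, here is a minimal sketch of what micro-aggregation does and how its error is usually measured. It is not the partitioning algorithm proposed in the paper, nor MDAV: it is a simple univariate, sort-based grouping (so it costs O(n log n)), and the function names `microaggregate` and `sse` are hypothetical, chosen only for illustration.

```python
from statistics import mean


def microaggregate(values, k):
    """Illustrative only: replace each value by the mean of its group.

    Values are sorted and cut into consecutive groups of k records; the
    last group absorbs the remainder, so every group has between k and
    2k - 1 members. This is NOT the paper's linear-time algorithm.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    # Indices of the records in increasing order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    protected = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        # The final group takes every record that is left over.
        end = len(values) if g == n_groups - 1 else start + k
        members = order[start:end]
        centroid = mean(values[i] for i in members)
        for i in members:
            protected[i] = centroid
    return protected


def sse(original, protected):
    """Sum of squared errors between original and protected values."""
    return sum((a - b) ** 2 for a, b in zip(original, protected))


data = [1, 2, 3, 10, 11, 12, 13]
masked = microaggregate(data, k=3)
print(masked)
print(sse(data, masked))
```

Each published value is shared by at least k records, which is the source of the privacy protection; the SSE quantifies the information lost by replacing records with group means, which is the error the abstract describes as bounded.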
Similar resources
Micro-SOM: A Linear-Time Multivariate Microaggregation Algorithm Based on Self-Organizing Maps
The protection of personal privacy is paramount, and consequently many efforts have been devoted to the study of data protection techniques. Governments, statistical agencies and corporations must protect the privacy of the individuals while guaranteeing the right of the society to knowledge. Microaggregation is one of the most promising solutions to deal with this praiseworthy task. However, i...
Evaluating the Potential of Differential Privacy Mechanisms for Census Data
Despite its undeniable attractiveness as the only data protection mechanism with formal privacy guarantees, the concept of differential privacy has been repeatedly criticized because of the deteriorating effects of currently available differential privacy mechanisms. Due to the strong assumptions regarding the knowledge of a potential data intruder, the amount of noise that needs to be added to...
Improved Univariate Microaggregation for Integer Values
Privacy issues during data publishing are an increasing concern of the entities involved. The problem is addressed in the field of statistical disclosure control with the aim of producing protected datasets that are also useful for interested end users such as government agencies and research communities. The problem of producing useful protected datasets is addressed in multiple computational priva...
Analyzing Tools and Algorithms for Privacy Protection and Data Security in Social Networks
The purpose of this research is to study factors influencing privacy concerns about data security and protection on social network sites and their influence on self-disclosure. 100 articles about privacy protection, data security, information disclosure and information leakage on social networks were studied. Models and algorithms types and their repetition in articles have been distinguished a...
Top-Coding and Public Use Microdata Samples from the U.S. Census Bureau
The US Census Bureau regularly releases Public Use Microdata Samples (PUMS), data files which contain de-identified subsets of the data provided by respondents to some of its various surveys and to the Decennial Census itself. This allows data users to perform “micro”-analyses rather than the “macro”-tabulations which are regularly performed by the Bureau. These data users range from non-gove...